
[ENH] V1 → V2 API Migration - studies #1610

Open
rohansen856 wants to merge 82 commits into openml:main from rohansen856:studies-migration

Conversation

@rohansen856
Contributor


Stacked PR, depends on #1576

This PR adds the Studies v2 API migration.

A question:
Due to the pre-commit hook I could not put 6 arguments in a function, so I had to work around that with this instead:
openml_api\resources\studies.py (lines 10-15)

        limit = kwargs.get("limit")
        offset = kwargs.get("offset")
        status = kwargs.get("status")
        main_entity_type = kwargs.get("main_entity_type")
        uploader = kwargs.get("uploader")
        benchmark_suite = kwargs.get("benchmark_suite")

I would like to confirm whether this approach is correct. Raising a draft PR for now.

@codecov-commenter

codecov-commenter commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 50.22693% with 329 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.06%. Comparing base (d421b9e) to head (18dc72a).

Files with missing lines Patch % Lines
openml/_api/clients/http.py 24.46% 142 Missing ⚠️
openml/_api/resources/base/versions.py 24.71% 67 Missing ⚠️
openml/_api/resources/study.py 25.00% 33 Missing ⚠️
openml/_api/runtime/core.py 55.38% 29 Missing ⚠️
openml/_api/resources/base/fallback.py 26.31% 28 Missing ⚠️
openml/testing.py 48.71% 20 Missing ⚠️
openml/_api/config.py 95.45% 3 Missing ⚠️
openml/_api/resources/base/base.py 76.92% 3 Missing ⚠️
openml/study/functions.py 50.00% 2 Missing ⚠️
openml/_api/__init__.py 88.88% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1610      +/-   ##
==========================================
+ Coverage   52.04%   52.06%   +0.02%     
==========================================
  Files          36       58      +22     
  Lines        4333     4965     +632     
==========================================
+ Hits         2255     2585     +330     
- Misses       2078     2380     +302     

☔ View full report in Codecov by Sentry.

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
@rohansen856
Contributor Author

Implementing noqa instead of the kwargs workaround, following the example from openml\testing.py:

    def _check_fold_timing_evaluations(  # noqa: PLR0913
        self,
        fold_evaluations: dict[str, dict[int, dict[int, float]]],
        num_repeats: int,
        num_folds: int,
        *,
        max_time_allowed: float = 60000.0,
        task_type: TaskType = TaskType.SUPERVISED_CLASSIFICATION,
        check_scores: bool = True,
    ) -> None:

Final function signature:

    def list(  # noqa: PLR0913
        self,
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
        main_entity_type: str | None = None,
        uploader: list[int] | None = None,
        benchmark_suite: int | None = None,
    ) -> Any:

Signed-off-by: rohansen856 <rohansen856@gmail.com>
@rohansen856 rohansen856 marked this pull request as ready for review January 13, 2026 07:21
Collaborator

@geetu040 geetu040 left a comment


Good work. Just use the listing as suggested in #1575 (comment), which is already similar to what you have done.

@rohansen856
Contributor Author

@geetu040 I reviewed the specific changes needed and have a slight doubt about the pandas implementation.
As I understand it, I need to use a pandas DataFrame instead of Any in openml\_api\resources\base.py, like this:

class StudiesAPI(ResourceAPI, ABC):
    @abstractmethod
    def list(  # noqa: PLR0913
        self,
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
        main_entity_type: str | None = None,
        uploader: list[int] | None = None,
        benchmark_suite: int | None = None,
    ) -> pd.DataFrame: ...

and similarly I have to change the return object in openml\_api\resources\studies.py from this: return response.text
to this:

        xml_string = response.text

        # Parse XML and convert to DataFrame
        study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))

        # Minimalistic check if the XML is useful
        assert isinstance(study_dict["oml:study_list"]["oml:study"], list), type(
            study_dict["oml:study_list"],
        )
        assert (
            study_dict["oml:study_list"]["@xmlns:oml"] == "http://openml.org/openml"
        ), study_dict["oml:study_list"]["@xmlns:oml"]

        studies = {}
        for study_ in study_dict["oml:study_list"]["oml:study"]:
            # maps from xml name to a tuple of (dict name, casting fn)
            expected_fields = {
                "oml:id": ("id", int),
                "oml:alias": ("alias", str),
                "oml:main_entity_type": ("main_entity_type", str),
                "oml:benchmark_suite": ("benchmark_suite", int),
                "oml:name": ("name", str),
                "oml:status": ("status", str),
                "oml:creation_date": ("creation_date", str),
                "oml:creator": ("creator", int),
            }
            study_id = int(study_["oml:id"])
            current_study = {}
            for oml_field_name, (real_field_name, cast_fn) in expected_fields.items():
                if oml_field_name in study_:
                    current_study[real_field_name] = cast_fn(study_[oml_field_name])
            current_study["id"] = int(current_study["id"])
            studies[study_id] = current_study

        return pd.DataFrame.from_dict(studies, orient="index")

A total of 3 files would be affected: openml\_api\resources\base.py, openml\_api\resources\studies.py and openml\study\functions.py

Can you please confirm my approach? After that I will update the PR.

@geetu040
Collaborator

@rohansen856 yes, sounds right.

@rohansen856
Contributor Author

Updated! Ready for review.

Collaborator

@geetu040 geetu040 left a comment


Almost fine, just completely remove _list_studies as well and replace _list_studies with api_context.backend.studies.list as the parameter for partial in list_studies. Hope I did not confuse you, just search for the exact method names in the code. Let me know if I am not clear enough.

@rohansen856
Contributor Author

rohansen856 commented Jan 16, 2026

Almost fine, just completely remove _list_studies as well and replace _list_studies with api_context.backend.studies.list as the parameter for partial in list_studies. Hope I did not confuse you, just search for the exact method names in the code. Let me know if I am not clear enough.

Oh definitely! I probably missed that in openml\study\functions.py, but I'm pushing the change with the next commit.
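For reference, a minimal sketch of what that change could look like in openml\study\functions.py (illustrative only: the import path for api_context, the keyword arguments, and the simplified signature are assumptions, not the final code):

    from functools import partial

    # Sketch only: the real import path for api_context may differ.
    from openml._api import api_context


    def list_studies(status=None, uploader=None, benchmark_suite=None, **paging):
        # The old module-level _list_studies helper is removed; the backend
        # resource method is handed to partial instead, as suggested above.
        listing_call = partial(
            api_context.backend.studies.list,
            status=status,
            uploader=uploader,
            benchmark_suite=benchmark_suite,
        )
        return listing_call(**paging)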

…list

Signed-off-by: rohansen856 <rohansen856@gmail.com>


class StudyV1API(ResourceV1API, StudyAPI):
    def list(  # noqa: PLR0913
Contributor

@EmanAbdelhaleem EmanAbdelhaleem Feb 4, 2026


I think we can split this into 3 functions for more readability:

  • list()
  • _build_url()
  • _parse_list_xml()

check #1606 for reference

Contributor Author


Understood! Will separate the long list function into the said 3 functions with proper docstrings. Applying with the next commit.
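A rough sketch of that split (illustrative only: the base classes are omitted, self._http and the v1 path-segment URL scheme are assumptions, and the field mapping mirrors the parsing code quoted earlier):

    from __future__ import annotations

    from typing import Any

    import pandas as pd
    import xmltodict


    class StudyV1API:  # bases (ResourceV1API, StudyAPI) omitted in this sketch
        def list(  # noqa: PLR0913
            self,
            limit: int | None = None,
            offset: int | None = None,
            status: str | None = None,
            main_entity_type: str | None = None,
            uploader: list[int] | None = None,
            benchmark_suite: int | None = None,
        ) -> pd.DataFrame:
            """List studies, delegating URL building and XML parsing to helpers."""
            url = self._build_url(
                limit=limit,
                offset=offset,
                status=status,
                main_entity_type=main_entity_type,
                uploader=uploader,
                benchmark_suite=benchmark_suite,
            )
            response = self._http.get(url)  # assumed HTTP-client attribute
            return self._parse_list_xml(response.text)

        def _build_url(self, **filters: Any) -> str:
            """Append only the filters that are not None as v1 path segments."""
            parts = ["study/list"]
            for name, value in filters.items():
                if value is None:
                    continue
                if isinstance(value, list):
                    value = ",".join(str(v) for v in value)
                parts.append(f"{name}/{value}")
            return "/".join(parts)

        def _parse_list_xml(self, xml_string: str) -> pd.DataFrame:
            """Parse the oml:study_list XML into a DataFrame keyed by study id."""
            study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))
            # Maps from xml name to a tuple of (dict name, casting fn).
            expected_fields = {
                "oml:id": ("id", int),
                "oml:alias": ("alias", str),
                "oml:main_entity_type": ("main_entity_type", str),
                "oml:benchmark_suite": ("benchmark_suite", int),
                "oml:name": ("name", str),
                "oml:status": ("status", str),
                "oml:creation_date": ("creation_date", str),
                "oml:creator": ("creator", int),
            }
            studies = {}
            for study_ in study_dict["oml:study_list"]["oml:study"]:
                parsed = {
                    name: cast(study_[xml_name])
                    for xml_name, (name, cast) in expected_fields.items()
                    if xml_name in study_
                }
                studies[int(study_["oml:id"])] = parsed
            return pd.DataFrame.from_dict(studies, orient="index")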

@@ -0,0 +1,94 @@
# License: BSD 3-Clause
from __future__ import annotations
Contributor


I think it would be better to change the file name to "test_study" for consistency

Contributor Author


Agreed! Applying with the next commit.

Contributor Author


Similarly in this case, the tests\test_study folder should be renamed to tests\test_studies.
cc @geetu040

Collaborator


Makes sense, but let's not do it here; that would make the file hard to review with the visible changes.

Contributor Author


Got it!

assert all(studies_df["status"] == "active")

@pytest.mark.uses_test_server()
def test_list_pagination(self):
Contributor


I don't think we need to test pagination here. These tests should only be specific to the API. It's better to leave this test in test_study_functions, if it's there.

Contributor Author


There is actually no pagination test in test_study_functions, so implementing it here should be fine. Let me know if you think we should still remove it.
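For context, roughly what the pagination test checks (a sketch of the test method as it sits in this test class; it assumes list() returns a DataFrame indexed by study id and that the test server has at least ten studies):

    @pytest.mark.uses_test_server()
    def test_list_pagination(self):
        # Assumption: limit/offset behave as standard paging parameters.
        first_page = self.api.list(limit=5, offset=0)
        second_page = self.api.list(limit=5, offset=5)

        assert len(first_page) == 5
        assert len(second_page) == 5
        # Consecutive pages should not return the same studies.
        assert set(first_page.index).isdisjoint(set(second_page.index))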


def setUp(self) -> None:
    super().setUp()
    self.api = StudyV2API(self.http_client)
Contributor

@EmanAbdelhaleem EmanAbdelhaleem Feb 4, 2026


This is v2, you need to use

self.v2_client = self._get_http_client(
            server="http://localhost:8001/",
            base_url="",
            api_key="",
            timeout_seconds=self.timeout_seconds,
            retries=self.retries,
            retry_policy=self.retry_policy,
            cache=self.cache,
        )

and change the server to your local v2 server

Contributor Author


Understood!
Replacing this:

self.api = StudyV2API(self.http_client)

with this:

self.v2_client = self._get_http_client(
            server="http://localhost:8001/",
            base_url="",
            api_key="",
            timeout=self.timeout,
            retries=self.retries,
            retry_policy=self.retry_policy,
            cache=self.cache,
)
self.api = StudyV2API(self.v2_client)

self.v2_api = StudyV2API(self.http_client)

@pytest.mark.uses_test_server()
def test_v1_v2_compatibility(self):
Contributor

@EmanAbdelhaleem EmanAbdelhaleem Feb 4, 2026


I think this should test that the output matches and follow the naming style mentioned here: #1575 (comment)

check #1603 for reference
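For illustration, a compatibility test along those lines could look roughly like this (a sketch only; it assumes both backends' list() returns a DataFrame and that pandas is imported as pd in the test module):

    @pytest.mark.uses_test_server()
    def test_list_matches(self):
        # Assumption: both backends return a DataFrame indexed by study id.
        v1_df = self.v1_api.list(limit=10)
        v2_df = self.v2_api.list(limit=10)
        pd.testing.assert_frame_equal(
            v1_df.sort_index(axis=1),
            v2_df.sort_index(axis=1),
        )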

# Both should have delete, tag, untag from base
for method in ["delete", "tag", "untag", "publish"]:
    assert hasattr(self.v1_api, method)
    assert hasattr(self.v2_api, method)
Contributor


I think you need to add Fallback tests as mentioned here: #1575 (comment)

check #1603 for reference

Contributor Author


Understood! I will implement FallbackProxy and a test_list_fallback function that checks that the FallbackProxy automatically falls back from V2 to V1 when V2 raises "not supported". Also, in the case of test_list_matches, I think it should be marked with @pytest.mark.skip(reason="V2 list not yet implemented"), as it currently throws OpenMLNotSupportedError.
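A minimal sketch of that fallback test, written with mocks so it does not depend on how the proxy is wired to the HTTP clients (the FallbackProxy constructor order, the exception's import path, and its message-only constructor are assumptions here):

    from unittest.mock import MagicMock

    import pandas as pd

    # Assumed import paths; the real locations come from this PR / #1576.
    from openml._api.resources.base.fallback import FallbackProxy
    from openml.exceptions import OpenMLNotSupportedError


    def test_list_fallback():
        """FallbackProxy should retry on V1 when V2 raises OpenMLNotSupportedError."""
        expected = pd.DataFrame({"name": ["study-a"]}, index=[1])

        v2_api = MagicMock()
        v2_api.list.side_effect = OpenMLNotSupportedError("list is not supported in v2")
        v1_api = MagicMock()
        v1_api.list.return_value = expected

        # Assumption: the proxy takes the preferred backend first, then the fallback.
        proxy = FallbackProxy(v2_api, v1_api)
        result = proxy.list(limit=1)

        v2_api.list.assert_called_once()
        v1_api.list.assert_called_once()
        pd.testing.assert_frame_equal(result, expected)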

Signed-off-by: rohansen856 <rohansen856@gmail.com>
@rohansen856 rohansen856 marked this pull request as ready for review February 5, 2026 05:40